Knowledge tracing (KT) aims to leverage students' learning histories to estimate their mastery levels on a set of pre-defined skills, based on which their future performance can be accurately predicted. In practice, a student's learning history comprises answers to sets of massed questions, each set known as a session, rather than merely a sequence of independent answers. Within and across these sessions, students' learning dynamics can differ considerably. Effectively modelling the dynamics of students' knowledge states both within and across sessions is therefore crucial for the KT problem. Most existing KT models treat a student's learning records as a single continuous sequence, without capturing the sessional shifts of the student's knowledge state. To address this issue, we propose a novel hierarchical transformer model named HiTSKT, which comprises an interaction(-level) encoder that captures the knowledge a student acquires within a session and a session(-level) encoder that summarises the acquired knowledge across past sessions. To predict an interaction in the current session, a knowledge retriever integrates the summarised past-session knowledge with the previous interactions' information into proper knowledge representations, which are then used to compute the student's current knowledge state. Additionally, to model the student's long-term forgetting behaviour across sessions, a power-law-decay attention mechanism is designed and deployed in the session encoder, allowing it to place more emphasis on recent sessions. Extensive experiments on three public datasets demonstrate that HiTSKT achieves new state-of-the-art performance on all of them compared with six state-of-the-art KT models.
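A minimal sketch of how a power-law decay could be folded into session-level attention weights; the exponent `alpha`, the use of session-index distance, and the re-normalisation step are illustrative assumptions, not HiTSKT's exact parameterisation.

```python
import torch
import torch.nn.functional as F

def power_law_decay_attention(q, k, v, session_idx, alpha=0.5, eps=1e-6):
    """Scaled dot-product attention whose weights decay with session distance.

    q, k, v:      (batch, num_sessions, dim) session representations
    session_idx:  (batch, num_sessions) integer position of each session
    alpha:        assumed decay exponent (hyper-parameter)
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5            # (B, S, S)
    # Distance between the attending session and every earlier session.
    dist = (session_idx.unsqueeze(-1) - session_idx.unsqueeze(-2)).abs().float()
    decay = (dist + 1.0).pow(-alpha)                        # power-law weight, 1 within the same session
    # Attend only to the current and earlier sessions (causal mask).
    causal = session_idx.unsqueeze(-1) >= session_idx.unsqueeze(-2)
    weights = F.softmax(scores.masked_fill(~causal, float("-inf")), dim=-1)
    weights = weights * decay
    weights = weights / (weights.sum(dim=-1, keepdim=True) + eps)  # re-normalise after decay
    return weights @ v
```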
Open-set recognition enables deep neural networks (DNNs) to identify samples of unknown classes while maintaining high classification accuracy on samples of known classes. Existing methods based on autoencoders (AEs) and prototype learning show great potential in handling this challenging task. In this study, we propose a novel method, called Class-Specific Semantic Reconstruction (CSSR), that integrates the power of AEs and prototype learning. Specifically, CSSR replaces prototype points with manifolds represented by class-specific AEs. Unlike conventional prototype-based methods, CSSR models each known class on a separate AE manifold and measures class belongingness by the AE's reconstruction error. The class-specific AEs are plugged on top of the DNN backbone and reconstruct the semantic representations learned by the DNN rather than the raw images. Through end-to-end learning, the DNN and the AEs boost each other to learn both discriminative and representative information. Experimental results on multiple datasets show that the proposed method achieves outstanding performance in both closed-set and open-set recognition, and that it is sufficiently simple and flexible to be incorporated into existing frameworks.
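A hedged sketch of the class-belongingness idea: one small AE per known class reconstructs the backbone's semantic features, and the negative reconstruction error acts as the class score. The layer sizes and the rejection rule are illustrative assumptions, not CSSR's exact design.

```python
import torch
import torch.nn as nn

class ClassSpecificAEs(nn.Module):
    """One small autoencoder per known class, applied to backbone features."""

    def __init__(self, feat_dim, num_classes, latent_dim=32):
        super().__init__()
        self.aes = nn.ModuleList([
            nn.Sequential(
                nn.Linear(feat_dim, latent_dim), nn.ReLU(),
                nn.Linear(latent_dim, feat_dim),
            )
            for _ in range(num_classes)
        ])

    def forward(self, feats):
        # feats: (batch, feat_dim) semantic features from the DNN backbone
        errors = torch.stack(
            [((ae(feats) - feats) ** 2).mean(dim=1) for ae in self.aes], dim=1
        )                                   # (batch, num_classes) reconstruction errors
        return -errors                      # higher score = stronger class belongingness

def predict_open_set(scores, threshold):
    """Reject as 'unknown' (-1) when even the best class fits poorly (assumed rule)."""
    best_score, best_class = scores.max(dim=1)
    return torch.where(best_score > threshold, best_class,
                       torch.full_like(best_class, -1))
```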
Modelling heterogeneity by extracting and exploiting high-order information from heterogeneous information networks (HINs) has attracted immense research attention in recent years. Such heterogeneous network embedding (HNE) methods effectively harness the heterogeneity of small-scale HINs. In the real world, however, HINs grow exponentially with the continuous introduction of new nodes and links of different types, turning them into billion-scale networks. Learning node embeddings on such enormous HINs creates a performance bottleneck for existing HNE methods, which are typically centralized, i.e., the complete data and the model both reside on a single machine. To meet the demands of large-scale HNE tasks with strong efficiency and effectiveness guarantees, we present the Decentralized Embedding Framework for Heterogeneous Information Networks (DeHIN). In DeHIN, we generate a distributed parallel pipeline that uses hypergraphs to inject parallelization into the HNE task. DeHIN presents a context-preserving partition mechanism that innovatively formulates a large HIN as a hypergraph, whose hyperedges connect semantically similar nodes. Our framework then adopts a decentralized strategy to efficiently partition HINs via a tree-like pipeline. Each resulting subnetwork is assigned to a distributed worker, which locally learns node embeddings from its received partition based on the deep information maximization theorem. We further devise a novel embedding alignment scheme that projects independently learned node embeddings from all subnetworks onto a common vector space, thus enabling downstream tasks such as link prediction and node classification.
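The abstract does not spell out how the embedding alignment scheme works. One common way to map independently learned spaces onto a shared one is orthogonal Procrustes over nodes shared between partitions; the sketch below illustrates only that generic idea, under the assumption that anchor nodes overlap across workers, and is not DeHIN's actual algorithm.

```python
import numpy as np

def align_to_reference(local_emb, ref_emb, anchor_local, anchor_ref):
    """Rotate one worker's embedding space onto a common reference space.

    local_emb:    (n_local, d) embeddings learned on one partition
    ref_emb:      (n_ref, d)   embeddings in the common (reference) space
    anchor_local: indices of anchor nodes in local_emb
    anchor_ref:   indices of the same anchor nodes in ref_emb
    """
    # Orthogonal Procrustes: find the rotation R minimising ||A R - B||_F.
    A = local_emb[anchor_local]
    B = ref_emb[anchor_ref]
    u, _, vt = np.linalg.svd(A.T @ B)
    R = u @ vt
    return local_emb @ R
```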
Learning with noisy labels is a vital topic for practical deep learning, as models should be robust to noisy open-world datasets in the wild. The state-of-the-art noisy label learning approach JoCoR fails when faced with a large ratio of noisy labels. Moreover, selecting small-loss samples can also cause error accumulation: once noisy samples are mistakenly selected as small-loss samples, they are more likely to be selected again. In this paper, we deal with error accumulation in noisy label learning from both the model and data perspectives. We introduce a mean point ensemble to utilize a more robust loss function and more information from unselected samples, reducing error accumulation from the model perspective. Furthermore, as flipped images have the same semantic meaning as the original images, we select small-loss samples according to the loss values of the flipped images instead of the original ones, reducing error accumulation from the data perspective. Extensive experiments on CIFAR-10, CIFAR-100, and the large-scale Clothing1M show that our method outperforms state-of-the-art noisy label learning methods under different levels of label noise. Our method can also be seamlessly combined with other noisy label learning methods to further improve their performance, and it generalizes well to other tasks. The code is available at https://github.com/zyh-uaiaaaa/MDA-noisy-label-learning.
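A minimal sketch of flip-based small-loss selection: per-sample losses are computed on horizontally flipped copies and the lowest-loss fraction is kept as presumed-clean. The keep ratio and the use of horizontal flips are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def select_small_loss_by_flip(model, images, labels, keep_ratio=0.7):
    """Pick the presumed-clean subset using losses on flipped images.

    images: (batch, C, H, W), labels: (batch,)
    """
    with torch.no_grad():
        flipped = torch.flip(images, dims=[-1])              # horizontal flip
        losses = F.cross_entropy(model(flipped), labels, reduction="none")
    num_keep = max(1, int(keep_ratio * images.size(0)))
    keep_idx = torch.argsort(losses)[:num_keep]              # smallest-loss samples
    return images[keep_idx], labels[keep_idx]
```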
Deep learning, especially convolutional neural networks, has triggered accelerated advancements in computer vision, bringing changes into our daily practice. Furthermore, standardized deep learning modules (also known as backbone networks), e.g., ResNet and EfficientNet, have enabled efficient and rapid development of new computer vision solutions. Yet, deep learning methods still suffer from several drawbacks. One of the most concerning problems is the high memory and computational cost, such that dedicated computing units, typically GPUs, have to be used for training and development. Therefore, in this paper, we propose a quantifiable evaluation method, the convolutional kernel redundancy measure, based on perceived image differences, for guiding network structure simplification. When applied to the chest X-ray image classification problem with ResNet, our method maintains the performance of the network while reducing the number of parameters from over $23$ million to approximately $128$ thousand (a reduction of $99.46\%$).
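The abstract describes the redundancy measure only at a high level. The sketch below illustrates one plausible reading, treating kernels whose response maps on sample images are nearly identical as redundant, using a simple normalised-difference proxy; the similarity metric and the threshold are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def kernel_redundancy(conv_weight, sample_images, threshold=0.05):
    """Count pairs of near-duplicate output channels of one conv layer.

    conv_weight:   (out_ch, in_ch, kH, kW) weights of a convolutional layer
    sample_images: (n, in_ch, H, W) images fed through that layer
    """
    pad = conv_weight.size(-1) // 2
    responses = F.conv2d(sample_images, conv_weight, padding=pad)   # (n, out_ch, H', W')
    # Flatten each channel's responses over images and pixels, then z-score them.
    flat = responses.permute(1, 0, 2, 3).reshape(conv_weight.size(0), -1)
    flat = (flat - flat.mean(dim=1, keepdim=True)) / (flat.std(dim=1, keepdim=True) + 1e-8)
    # Mean absolute difference between every pair of normalised channel responses.
    diff = (flat.unsqueeze(0) - flat.unsqueeze(1)).abs().mean(dim=-1)  # (out_ch, out_ch)
    redundant_pairs = (diff < threshold).float().triu(diagonal=1)
    return int(redundant_pairs.sum().item())
```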
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications. However, the enormous size of large-scale graphs hinders their application under real-time inference scenarios. Although existing scalable GNNs leverage linear propagation to preprocess the features and accelerate training and inference, these methods still suffer from scalability issues when making inferences on unseen nodes, as the feature preprocessing requires the graph to be known and fixed. To speed up inference in the inductive setting, we propose a novel adaptive propagation order approach that generates a personalized propagation order for each node based on its topological information. This successfully avoids redundant computation in feature propagation. Moreover, the trade-off between accuracy and inference latency can be flexibly controlled by simple hyper-parameters to match the latency constraints of different application scenarios. To compensate for the potential loss in inference accuracy, we further propose Inception Distillation to exploit multi-scale receptive information and improve inference performance. Extensive experiments are conducted on four public datasets with different scales and characteristics, and the results show that our proposed inference acceleration framework outperforms state-of-the-art (SOTA) graph inference acceleration baselines in terms of both accuracy and efficiency. In particular, the advantage of our method is more significant on larger-scale datasets, and our framework achieves a $75\times$ inference speedup on the largest Ogbn-products dataset.
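The abstract does not give the exact rule for personalising the propagation order. The sketch below shows the general idea under a simple assumption, namely that higher-degree nodes, whose receptive fields saturate quickly, get fewer propagation steps; the degree heuristic and `max_hops` are illustrative choices, not the paper's criterion.

```python
import numpy as np

def personalized_propagation(features, adj_norm, degrees, max_hops=4):
    """Propagate features a node-specific number of hops.

    features: (n, d) raw node features
    adj_norm: (n, n) normalised adjacency matrix (dense here for clarity)
    degrees:  (n,)   node degrees used as the topological signal
    """
    # Assumed heuristic: high-degree nodes stop early, low-degree nodes go deeper.
    hops = np.clip(max_hops - np.log2(degrees + 1).astype(int), 1, max_hops)
    out = features.copy()
    propagated = features.copy()
    for k in range(1, max_hops + 1):
        propagated = adj_norm @ propagated
        still_active = hops >= k
        out[still_active] = propagated[still_active]   # keep the k-hop result for active nodes
    return out
```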
Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey of deep learning in single-cell analysis. We first introduce the background of single-cell technologies and their development, as well as fundamental concepts of deep learning, including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications, noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. For each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.
Online knowledge distillation (OKD) improves the involved models by mutually exploiting the differences between teacher and student. Several key bottlenecks concerning the gap between them, e.g., why and when a large gap harms performance, especially the student's, and how to quantify the gap between teacher and student, have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD) to answer these questions. Instead of focusing on the accuracy gap at the test phase, the core idea of SwitOKD is to adaptively calibrate the gap at the training phase, namely the distillation gap, via a switching strategy between two modes: expert mode (pause the teacher while keeping the student learning) and learning mode (restart the teacher). To maintain an appropriate distillation gap, we further devise an adaptive switching threshold, which provides a formal criterion for when to switch to learning mode or expert mode, thereby improving the student's performance. Meanwhile, the teacher benefits from our adaptive switching threshold and essentially keeps pace with other online arts. We further extend SwitOKD to multiple networks with two fundamental topologies. Finally, extensive experiments and analysis validate the merits of SwitOKD for classification over the state of the art. Our code is available at https://github.com/hfutqian/switokd.
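A hedged sketch of the switching idea: the distillation gap is monitored each training step, and the teacher's update is paused (expert mode) or resumed (learning mode) depending on a threshold. Measuring the gap as the difference of cross-entropy losses and using a fixed threshold are assumptions for illustration, not SwitOKD's adaptive criterion.

```python
import torch.nn.functional as F

def should_pause_teacher(teacher_out, student_out, labels, gap_threshold=0.3):
    """Return True for expert mode (teacher paused), False for learning mode."""
    teacher_loss = F.cross_entropy(teacher_out, labels)
    student_loss = F.cross_entropy(student_out, labels)
    gap = (student_loss - teacher_loss).item()   # proxy for the distillation gap
    return gap > gap_threshold

# Inside the training loop (sketch):
#   pause = should_pause_teacher(t_logits, s_logits, y)
#   student_optimizer.step()                     # the student always learns
#   if not pause:
#       teacher_optimizer.step()                 # the teacher learns only in learning mode
```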
For decades, computer systems have held large amounts of personal data. On the one hand, this abundance of data enables breakthroughs in artificial intelligence (AI), especially machine learning (ML) models. On the other hand, it can threaten users' privacy and weaken the trust between humans and AI. Recent regulations require that private information about a user be removable from computer systems in general, and from ML models in particular, upon request (e.g., the "right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context, as ML models often "remember" the old data. Existing adversarial attacks have proved that private membership or attributes of the training data can be learned from a trained model. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget particular data. It turns out that recent work on machine unlearning has not been able to completely solve the problem due to the lack of common frameworks and resources. In this survey paper, we seek to provide a thorough study of machine unlearning in terms of its definitions, scenarios, mechanisms, and applications. Specifically, as a collection of categorized state-of-the-art research, we hope to provide a broad reference for those seeking a primer on machine unlearning and its various formulations, design requirements, removal requests, algorithms, and uses in ML applications. In addition, we hope to outline key findings and trends in the paradigm and highlight new research areas that have not yet seen the application of machine unlearning but could still benefit greatly from it. We hope this survey serves as a valuable reference for ML researchers as well as those seeking to innovate privacy technologies. Our resources are at https://github.com/tamlhp/awesome-machine-unlearning.
Multimodal knowledge graphs (MKGs) include not only relational triples but also associated multimodal auxiliary data (i.e., text and images), which enhances the diversity of knowledge. However, their natural incompleteness severely hinders the application of MKGs. To address this problem, existing studies employ embedding-based reasoning models to infer missing knowledge after fusing multimodal features. However, the reasoning performance of these methods is limited due to the following problems: (1) ineffective fusion of multimodal auxiliary features; (2) a lack of complex reasoning ability and the inability to conduct multi-hop reasoning, which could infer more missing knowledge. To overcome these problems, we propose a novel model named MMKGR (Multimodal Knowledge Graph Reasoning). Specifically, the model contains the following two components: (1) a unified gate-attention network, designed to generate effective multimodal complementary features through sufficient attention interaction and noise reduction; (2) a complementary-feature-aware reinforcement learning method, which predicts the missing elements by performing a multi-hop reasoning process based on the features obtained in component (1). Experimental results show that MMKGR outperforms state-of-the-art methods on MKG reasoning tasks.
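A minimal sketch of gated multimodal fusion in the spirit of component (1): attention over the textual and visual features, followed by a learned gate that decides how much auxiliary signal to mix into the structural entity embedding. The dimensions, the single attention head, and the residual form are illustrative assumptions rather than MMKGR's unified gate-attention network.

```python
import torch
import torch.nn as nn

class GatedMultimodalFusion(nn.Module):
    """Fuse structural, textual, and visual entity features with attention plus a gate."""

    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, struct_feat, text_feat, image_feat):
        # struct_feat, text_feat, image_feat: (batch, dim)
        aux = torch.stack([text_feat, image_feat], dim=1)           # (batch, 2, dim)
        query = struct_feat.unsqueeze(1)                            # (batch, 1, dim)
        attended, _ = self.attn(query, aux, aux)                    # attend over the auxiliary modalities
        attended = attended.squeeze(1)                              # (batch, dim)
        g = self.gate(torch.cat([struct_feat, attended], dim=-1))   # how much auxiliary signal to keep
        return struct_feat + g * attended                           # gated complementary feature
```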